Hypothesis Testing for High-dimensional Sparse Binary Regression.

Authors

  • Rajarshi Mukherjee
  • Natesh S Pillai
  • Xihong Lin
Abstract

In this paper, we study the detection boundary for minimax hypothesis testing in the context of high-dimensional, sparse binary regression models. Motivated by genetic sequencing association studies for rare variant effects, we investigate the complexity of the hypothesis testing problem when the design matrix is sparse. We observe a new phenomenon in the behavior of the detection boundary which does not occur in the case of Gaussian linear regression. We derive the detection boundary as a function of two components: a design matrix sparsity index and signal strength, each of which is a function of the sparsity of the alternative. For any alternative, if the design matrix sparsity index is too high, any test is asymptotically powerless irrespective of the magnitude of signal strength. For binary design matrices with a sparsity index that is not too high, our results are parallel to those in the Gaussian case. In this context, we derive detection boundaries for both dense and sparse regimes. For the dense regime, we show that the generalized likelihood ratio is rate optimal; for the sparse regime, we propose an extended Higher Criticism Test and show it is rate optimal and sharp. We illustrate the finite sample properties of the theoretical results using simulation studies.

Similar resources

Mammalian Eye Gene Expression Using Support Vector Regression to Evaluate a Strategy for Detecting Human Eye Disease

Background and purpose: Machine learning is a class of modern and strong tools that can solve many important problems that nowadays humans may be faced with. Support vector regression (SVR) is a way to build a regression model which is an incredible member of the machine learning family. SVR has been proven to be an effective tool in real-value function estimation. As a supervised-learning appr...

A Flexible Framework for Hypothesis Testing in High-dimensions

Hypothesis testing in the linear regression model is a fundamental statistical problem. We consider linear regression in the high-dimensional regime where the number of parameters exceeds the number of samples (p > n) and assume that the high-dimensional parameter vector is s0-sparse. We develop a general and flexible ℓ∞ projection statistic for hypothesis testing in this model. Our framework ...

High-dimensional classification by sparse logistic regression

We consider high-dimensional binary classification by sparse logistic regression. We propose a model/feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size and derive the non-asymptotic bounds for the resulting misclassification excess risk. The bounds can be reduced under the additional low-noise condition. The proposed complexity penalty ...

Minimax risks for sparse regressions: Ultra-high dimensional phenomenons

Abstract: Consider the standard Gaussian linear regression model Y = Xθ0 + ε, where Y ∈ R^n is a response vector and X ∈ R^(n×p) is a design matrix. Numerous works have been devoted to building efficient estimators of θ0 when p is much larger than n. In such a situation, a classical approach amounts to assuming that θ0 is approximately sparse. This paper studies the minimax risks of estimation and testing...

Two-sample testing in high dimensions

We propose new methodology for two-sample testing in high dimensional models. The methodology provides a high dimensional analogue to the classical likelihood ratio test and is applicable to essentially any model class where sparse estimation is feasible. Sparse structure is used in the construction of the test statistic. In the general case, testing then involves nonnested model comparison, and...


Journal title:
  • Annals of Statistics

Volume 43, Issue 1

Publication date: 2015